# Image to Text

Vit Gpt2 Image Captioning
Apache-2.0
This is an image captioning model based on ViT and GPT2 architectures, capable of generating natural language descriptions for input images.
Image-to-Text
aryan083
31
0
Bpe Vocab N OCR
Apache-2.0
Bpe-vocab-n-OCR is an advanced text extraction tool based on OCR, optimized for generating structured and tokenized output.
Image-to-Text Transformers Supports Multiple Languages
prithivMLmods
76
4
BLIP Radiology Model
An image-to-text model based on the transformers library, supporting the conversion of image content into descriptive text.
Image-to-Text Transformers
motheecreator
152
0
OCR TextInput Base
A specialized image-to-text model for the financial domain, supporting English text recognition, primarily used for processing image content in financial documents.
Text Recognition Transformers English
rohit5895
31
0
Trocr Base Finetune Numbers
TrOCR is a Transformer-based optical character recognition model designed to extract text content from images.
Image-to-Text Transformers English
ANANDHU-SCT
23
0
Trocr Sinhala
This model is a fine-tuned version of Microsoft's TrOCR printed text model, designed for Sinhala OCR tasks.
Text Recognition Transformers Other
Ransaka
66
1
Ocrmnist
Apache-2.0
An optical character recognition model based on Hugging Face Transformers, designed for recognizing MNIST-style digit images.
Text Recognition Transformers English
vanshp123
16
0
Trocr Base Printed Captcha Ocr
A captcha recognition model fine-tuned from Microsoft's trocr-base-printed model, designed for OCR tasks involving printed text.
Text Recognition Transformers
chanelcolgate
33
1
Image Caption Using ViT GPT2
Apache-2.0
This is an image captioning model based on Vision Transformer (ViT) and GPT2 architectures, capable of generating natural language descriptions for input images.
Image-to-Text Transformers
Ayansk11
15
1
Trocr Base Fa V2
This is a Transformer-based OCR model specifically designed for recognizing Persian text in images.
Text Recognition Other
hezarai
64
3
Manga Ocr Base
Apache-2.0
An optical character recognition model specialized for Japanese text in manga.
Text Recognition Transformers Japanese
TareHimself
96
1
Donut Base Sroie
MIT
A fine-tuned version of naver-clova-ix/donut-base trained on an image folder dataset; no specific use case is stated.
Text Recognition Transformers
iamkhadke
13
0
Hdd Words Ocr
An OCR model that recognizes Hebrew text in images and converts it to text.
Text Recognition Transformers Other
sivan22
25
0
Pix2struct Docvqa Base
Apache-2.0
Pix2Struct is an image encoder-text decoder model trained on image-text pairs, supporting various tasks including image captioning and visual question answering.
Image-to-Text Transformers Supports Multiple Languages
google
8,601
37
Donut Base Sroie
MIT
This model is a fine-tuned version of naver-clova-ix/donut-base on an image folder dataset, suitable for document understanding tasks.
Text Recognition Transformers
unstructuredio
31
1
Ko Trocr Base Nsmc News Chatbot
MIT
This is a proof-of-concept model for Korean text recognition, built on the TrOCR architecture, that extracts Korean text from images.
Image-to-Text Transformers Korean
daekeun-ml
44
10
Donut Base Sroie
MIT
A document understanding model fine-tuned from philschmid/donut-base-sroie.
Text Recognition Transformers
Prem11100
13
0
Donut Base Medical Handwritten Prescriptions Information Extraction
MIT
A fine-tuned Donut model for extracting text information from handwritten medical prescription images.
Image-to-Text Transformers
mjawadazad2321
71
1
Donut Base Sroie
MIT
A document understanding model fine-tuned from naver-clova-ix/donut-base, suitable for extracting text from images.
Text Recognition Transformers
philschmid
185
3
Trocr Base Printed
A fork of microsoft/trocr-base-printed, specializing in OCR tasks for printed text.
Text Recognition
philschmid
14
2
Doctr Torch Crnn Mobilenet V3 Large French
An optical character recognition (OCR) model based on TensorFlow 2 and PyTorch, supporting multilingual text detection and recognition.
Text Recognition Transformers Supports Multiple Languages
Felix92
33
3
Vit Gpt2 Image Captioning
Apache-2.0
This is an image captioning model based on ViT and GPT2 architectures, capable of generating natural language descriptions for input images.
Image-to-Text Transformers
nlpconnect
939.88k
887
Trocr Base Stage1
TrOCR is a Transformer-based pretrained optical character recognition model developed by Microsoft, suitable for single-line text image OCR tasks.
Image-to-Text Transformers
microsoft
18.74k
13
Vit2distilgpt2
MIT
This is an image-to-text generation model capable of receiving images and outputting descriptive text.
Image-to-Text Transformers English
sachin
49
8
Trocr Small Stage1
TrOCR is a Transformer-based pre-trained optical character recognition model that adopts an encoder-decoder architecture, suitable for OCR tasks on single-line text images.
Image-to-Text Transformers
microsoft
3,713
12
© 2025 AIbase